Benchmark GPT-tfjs #659
Conversation
thanks, very interesting!

@tharvik I'd be curious to hear your opinion on a few things:
superb! that's very nice to have metrics on what we're doing, thanks!
> Where do you think we should report the benchmark? I was thinking of reporting them all in this PR and linking to it where relevant (e.g. in gpt/config.ts or the GPT class docstring)
The nicest thing would be to be able to generate such metrics via the CLI, with the example output of the command being what you have here (not the tables, I mean, but the same content).
> Benchmarking performance requires modifying the gpt source code to keep track of memory. Do you think it's worth keeping around, or should we leave the benchmark on this branch and not merge it?
Not merging it means that it'll slowly drift out of date. I think adding the memory usage to the EpochLogs as you did is the way to go; see my comments related to it.
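For context, a minimal sketch of what tracking memory in the epoch logs could look like; the field names here are hypothetical, not disco's actual EpochLogs definition:

```ts
import * as tf from '@tensorflow/tfjs'

// Hypothetical shape: the project's real EpochLogs type may differ.
interface EpochLogs {
  epoch: number
  trainingLoss: number
  validationLoss?: number
  // New field: peak bytes observed via tf.memory() during the epoch.
  peakMemoryBytes?: number
}

// Sampling tf.memory() after each update keeps the benchmark logic
// inside the normal training loop instead of on a separate branch.
function recordMemory(logs: EpochLogs): EpochLogs {
  return {
    ...logs,
    peakMemoryBytes: Math.max(logs.peakMemoryBytes ?? 0, tf.memory().numBytes),
  }
}
```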
The memory values will need to be slightly updated when #807 is merged.

up or down? ;)

A very superficial benchmark showed a 10-20% decrease in memory usage!
Training
Benchmarks were run on a 2022 MacBook Air M2 with 16 GB of RAM.
To reproduce, check out 58f018f and run, for example:
`npm -w cli run benchmark_gpt -- --contextLength 128 --batchSize 8`
Time per token is obtained by measuring the time of 10 training update iterations and dividing by (batch size × context length).
Memory values are the maximum of the memory allocated during the attention mechanism and the memory allocated after computing the gradients; so far, the attention mechanism has always had the higher memory requirement. The actual peak memory allocated during training may differ, but tfjs doesn't expose this information easily.
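For illustration, here is a minimal sketch of such a measurement loop. The model setup, `vocabSize`, and the use of `trainOnBatch` are assumptions made for the example, not the actual benchmark_gpt implementation:

```ts
import * as tf from '@tensorflow/tfjs-node'

// Illustrative stand-ins: `model` represents the compiled gpt-tfjs model.
declare const model: tf.LayersModel
const vocabSize = 50257
const batchSize = 8
const contextLength = 128
const iterations = 10

async function benchmarkTraining(): Promise<void> {
  // Random token ids shaped like a real training batch.
  const xs = tf.randomUniform([batchSize, contextLength], 0, vocabSize, 'int32')
  const ys = tf.randomUniform([batchSize, contextLength], 0, vocabSize, 'int32')

  let peakBytes = 0
  const start = performance.now()
  for (let i = 0; i < iterations; i++) {
    await model.trainOnBatch(xs, ys)
    // tf.memory().numBytes reports currently allocated bytes, sampled
    // between updates; the true peak inside an op is not observable.
    peakBytes = Math.max(peakBytes, tf.memory().numBytes)
  }
  const elapsedMs = performance.now() - start

  console.log(`time/token: ${(elapsedMs / (iterations * batchSize * contextLength)).toFixed(3)} ms`)
  console.log(`max observed memory: ${(peakBytes / 1e9).toFixed(2)} GB`)
  xs.dispose()
  ys.dispose()
}
```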
I leave cells empty where I deemed the benchmark too slow to run. If needed, missing values can be extrapolated: in the tables below, memory usage roughly doubles whenever the batch size doubles.

gpt-nano
| | batch_size=8 | batch_size=16 | batch_size=32 | batch_size=64 |
| --- | --- | --- | --- | --- |
| context_length=128 | 0.33 GB | 0.56 GB | 1.12 GB | 2.18 GB |
| context_length=256 | 0.64 GB | 1.22 GB | 2.36 GB | 4.66 GB |
| context_length=512 | 1.42 GB | 2.75 GB | 5.42 GB | |
| context_length=1024 | 3.56 GB | 6.98 GB | | |
| context_length=2048 | 10.2 GB | | | |
gpt-micro

| | batch_size=8 | batch_size=16 | batch_size=32 |
| --- | --- | --- | --- |
| context_length=128 | 0.6 GB | 1 GB | 1.86 GB |
| context_length=256 | 1.1 GB | 2 GB | 3.8 GB |
| context_length=512 | 2.3 GB | 4.4 GB | |
| context_length=1024 | 5.8 GB | | |
gpt-mini

| | batch_size=8 | batch_size=16 |
| --- | --- | --- |
| context_length=128 | 1 GB | 1.75 GB |
| context_length=256 | 1.9 GB | 3.5 GB |
gpt2

| | batch_size=8 |
| --- | --- |
| context_length=128 | 7.7 GB |
| context_length=256 | 12.7 GB |
Comparisons
Using the Python nanoGPT benchmark script on the same machine, I get the following comparisons between Python and JS:
| gpt-nano | gpt-tfjs | python (nanoGPT repo) |
| --- | --- | --- |
| batch_size=8 and context_length=128 | | |
| batch_size=32 and context_length=512 | | |
Inference
Run
npm -w cli run benchmark_gpt -- --inference --modelPath <path to trained model>
For
gpt-nano
trained with context length 128, inference time averages between 6 and 8 ms/token.WebGPT reports 3 ms/token at 5M parameters, which is between gpt-nano (2.5M) and gpt-micro (7.2M). They also managed to scale up to 1.5B parameters on a M1 Mac with WebGPU.
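A rough sketch of how such a per-token figure can be measured is below; `generateNextToken` is a hypothetical stand-in for the model's autoregressive decoding step, not an actual gpt-tfjs function:

```ts
import * as tf from '@tensorflow/tfjs-node'

// Hypothetical decoding step: given the model and the tokens generated so
// far, returns the next token id. The real gpt-tfjs generation API may differ.
declare function generateNextToken(
  model: tf.LayersModel,
  tokens: number[],
): Promise<number>

async function msPerToken(
  model: tf.LayersModel,
  prompt: number[],
  nTokens = 100,
): Promise<number> {
  const tokens = [...prompt]
  const start = performance.now()
  for (let i = 0; i < nTokens; i++) {
    tokens.push(await generateNextToken(model, tokens))
  }
  // Average wall-clock time per generated token, in milliseconds.
  return (performance.now() - start) / nTokens
}
```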